Abstract

Degraded text recognition is a difficult task. Given a
noisy text image, a word recognizer can be applied to
generate several word candidates for each word image.
High-level knowledge sources can then be used to select
a candidate from the candidate set for each word image.
In this paper, we propose that visual inter-word
constraints can be used to facilitate candidate selection.
Visual inter-word constraints provide a way to
link word images inside the text page, and to interpret
them systematically.

Introduction 

The objective of visual text recognition is to transform
an arbitrary image of text into its correct symbolic
equivalent. Recent technical advances in the area of
document recognition have made automatic text recognition
a viable alternative to manual key entry. Given a
high-quality text page, a commercial document recognition
system can recognize the words on the page with
a high correct rate. However, given a degraded text
page, such as a multiple-generation photocopy or facsimile,
performance usually drops abruptly ([1]).

Given a degraded text image, word images can be extracted
after layout analysis. A word image from a degraded
text page may have touching, broken, distorted
or blurred characters, which may
make the word image difficult to recognize accurately.
After character recognition and correction based on
dictionary look-up, a word recognizer will provide one or
more word candidates for each word image. Figure 1
lists the word candidate sets for the sentence "Please
fill in the application form." Each word candidate has
a confidence score, but the score may not be reliable
because of noise in the image. The correct word candidate
is usually in the candidate set, but may not be the
candidate with the highest confidence score. Simply
picking the word candidate with the highest
recognition score may make the correct rate
quite low, so we need a method that selects a
candidate for each word image so that the correct rate
is as high as possible.

Figure 1: Candidate sets for the sentence "Please fill
in the application form!"

Contextual information and high-level knowledge can
be used to select a decision word for each word image
in its context. Currently, there are two approaches
to the problem of candidate selection: the statistical
approach and the structural approach. In the statistical
approach, language models, such as a Hidden
Markov Model and word collocation, can be utilized for
candidate selection ([2, 4, 5]). In the structural
approach, lattice parsing techniques have been developed
for candidate selection ([3, 7]).

The contextual constraints considered in a statistical
language model, such as word collocation, are local
constraints. For a word image, a candidate is selected
according to the candidate information from its
neighboring word images within a fixed window. The
window size is usually set to 1 or 2. In the lattice parsing
method, a grammar is used to select a candidate
for each word image inside a sentence so that the
sequence of selected candidates forms a grammatical
and meaningful sentence. For example, consider
the sentence "Please fill in the application form". We
assume all words except the word "form" have been
recognized correctly and the candidate set for the word
"form" is { farm, form, force, foam, forth } (see the second
sentence in Figure 2). The candidate "form" can
be selected easily because word collocation between the
words "application" and "form" is strong and the sentence
is grammatical when the candidate "form" is chosen.
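
The collocation-based choice in this example can be sketched as follows. This is a minimal illustration with a window size of 1, not the language model used in the cited work: the collocation strengths, the confidence scores, the additive scoring rule and the name `select_by_collocation` are all assumptions made for the example.

```python
# Hypothetical collocation strengths; a real system would estimate
# them from a training corpus.
COLLOCATION = {
    ("application", "form"): 0.9,
    ("application", "farm"): 0.1,
}


def select_by_collocation(candidates, left_word, collocation, weight=1.0):
    """Return the candidate word that maximizes recognizer confidence
    plus weighted collocation strength with the left neighbor.

    candidates: list of (word, confidence) pairs for one word image.
    left_word:  the already-decided word to the left (window size 1).
    """
    def score(item):
        word, confidence = item
        return confidence + weight * collocation.get((left_word, word), 0.0)

    return max(candidates, key=score)[0]


# Hypothetical confidence scores: "farm" is the recognizer's top choice.
candidates = [
    ("farm", 0.40),
    ("form", 0.35),
    ("force", 0.10),
    ("foam", 0.10),
    ("forth", 0.05),
]
decision = select_by_collocation(candidates, "application", COLLOCATION)
```

With these assumed numbers, the strong collocation between "application" and "form" outweighs the small confidence advantage of "farm".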

Sentence 1

1    2    3  4      5   6    7  8    9   10
This farm is almost the same as that one

Sentence 2

11     12   13 14  15          16   17
Please fill in the application farm !

Figure 2: Word candidates of two example sentences
(word images 2 and 16 are similar)

The contextual information inside a small window
or inside a sentence sometimes may not be enough to
select a candidate correctly. For example, consider the
sentence "This form is almost the same as that one" (see
the first sentence in Figure 2). Word image 2
has five candidates: { farm, form, force, foam, forth }.
After lattice parsing, the candidate "forth" will be
removed because it does not fit the context. But it
is difficult to select a candidate from "farm", "form",
"foam" and "force" because each of them makes the
sentence grammatical and meaningful. In such a case,
more contextual constraints are needed to distinguish
the remaining candidates and to select one.

Let us further assume that the sentences in Figure 2
are from the same text. By image matching, we know
word images 2 and 16 are visually similar. If two word
images are almost the same, they must be the same
word. Therefore, the same candidate must be selected
for word image 2 and word image 16. After
word image 16 chooses "form", word image 2 will also
choose "form" as its identity.
Note 1: "≈" means approximate image matching;
Note 2: "•" means concatenation.

Figure 3: Possible Inter-word Relations

Visual Inter-Word Relations

A visual inter-word relation can be defined on two word 
images if they share the same pattern at the image level. 
There are 5 types of visual inter-word relations listed
in the right part of Figure 3. Figure 4 is a part of a
scanned text image in which a small number of word
relations are circled to demonstrate the abundance of
the inter-word relations defined above, even in such a small
fragment of a real text page. Word images 2 and 8
are almost the same. Word image 9 can match the left
part of word image 1 quite well. Word image 5 can
match a part of word image 6, and so on.

Visual inter-word relations can be computed by applying
simple image matching techniques. They can be
defined in clean text images, as well as in highly degraded
text images, because word images, due to
their relatively large size, are quite tolerant to noise
([6]).
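
As a rough illustration of what "simple image matching" could look like, the sketch below compares binarized word images pixel by pixel. It is not the matching technique of [6]: the representation (lists of rows of 0/1 pixels), the tolerance `tol`, and the function names are assumptions chosen for the example, and a practical matcher would also have to handle small shifts and size differences.

```python
def mismatch_ratio(a, b):
    """Fraction of differing pixels between two equally sized binary images."""
    total = len(a) * len(a[0])
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def crop_columns(img, start, width):
    """Return the sub-image made of `width` columns starting at `start`."""
    return [row[start:start + width] for row in img]


def images_match(a, b, tol=0.1):
    """Approximate whole-image match (two images that are almost the same)."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return False
    return mismatch_ratio(a, b) <= tol


def matches_left_part(short, long, tol=0.1):
    """True if `short` approximately matches the left part of `long`."""
    if len(short) != len(long) or len(short[0]) > len(long[0]):
        return False
    left = crop_columns(long, 0, len(short[0]))
    return mismatch_ratio(short, left) <= tol


# Tiny hypothetical images: word_b is word_a with one noisy pixel.
word_a = [[1, 0, 1, 1, 0, 1],
          [1, 1, 0, 1, 0, 1]]
word_b = [[1, 0, 1, 1, 0, 1],
          [1, 1, 0, 1, 1, 1]]
prefix = [[1, 0, 1],
          [1, 1, 0]]

same_word = images_match(word_a, word_b)
left_part = matches_left_part(prefix, word_a)
```

Because the decision is based on the proportion of mismatched pixels over the whole word image, a few noisy pixels do not change the outcome, which is the intuition behind the noise tolerance mentioned above.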

Visual inter-word relations can be used as constraints
in the process of word image interpretation, especially
for candidate selection. It is not surprising that word
relations at the image level are highly consistent with
word relations at the symbolic level (see Figure 3). If
two words hold a relation at the symbolic level and they
are written in the same font and size, their word images
should keep the same relation at the image level.
Conversely, if two word images hold a relation at the image
level, the truth values of the word images should have
the same relation at the symbolic level. In Figure 4,
word images 2 and 8 must be recognized as the same
word because they can match each other; the identity
of word image 5 must be a sub-string of the identity of
word image 6 because word image 5 can match
a part of word image 6; and so on.
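
The symbolic side of such a constraint can be sketched as a filter over pairs of candidates. The relation names `same`, `prefix` and `substring` below are labels invented for this example and cover only the three relations mentioned in the text; they are not the type numbering of Figure 3, and the candidate words are hypothetical.

```python
def consistent_pairs(cands_a, cands_b, relation):
    """Return the (a, b) candidate pairs that satisfy `relation`
    at the symbolic level.

    relation: "same"      - image A matches image B as a whole
              "prefix"    - image A matches the left part of image B
              "substring" - image A matches some part of image B
    """
    checks = {
        "same": lambda a, b: a == b,
        "prefix": lambda a, b: b.startswith(a),
        "substring": lambda a, b: a in b,
    }
    holds = checks[relation]
    return [(a, b) for a in cands_a for b in cands_b if holds(a, b)]


# Hypothetical candidate sets for two related word images.
image_5 = ["the", "tie", "he"]
image_6 = ["other", "otter", "ether"]
pairs = consistent_pairs(image_5, image_6, "substring")
```

Here the candidates "tie" and "otter" take part in no consistent pair, so the visual relation alone already argues against them.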

Visual inter-word constraints provide a way to
link word images inside a text page, and to interpret
them systematically. Integrating visual inter-word
constraints with a statistical language model and lattice
parser improves the performance of candidate selection,
as shown above.
A word-collocation-based relaxation algorithm and
a probabilistic lattice chart parser have been designed
for word candidate selection in degraded text
recognition ([3, 4]). The relaxation algorithm runs
iteratively. In each iteration, the confidence score of
each candidate is updated based on its current
confidence and its word collocation scores with the currently
most preferred candidates of its neighboring word images.
Relaxation ends when all candidates reach a
stable state. For each word image, candidates
with low confidence scores are then removed from the
candidate set. Then, the probabilistic lattice chart parser
is applied to the reduced candidate sets to select
the candidates which appear in the most preferred parse
trees built by the parser. There are different strategies
for using visual inter-word constraints inside the
relaxation algorithm and the lattice parser. One of the
strategies we are exploiting is to re-evaluate the top1
candidates of the related word images after each iteration
of relaxation or after lattice parsing. If they hold
the same relation at the symbolic level, the confidence scores of
the candidates are increased. Otherwise, the images
with low confidence scores follow the decision of the
images with high confidence scores.
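
The following sketch illustrates the general shape of such a procedure for the "same word" relation only. The text does not give the update formula, so the multiplicative update, the normalization, the stopping test, the `bonus` value and all function names are assumptions; candidate pruning and the lattice parser are omitted.

```python
def top(cands):
    """Return the currently most preferred candidate of one word image."""
    return max(cands, key=cands.get)


def relax(cand_sets, collocation, weight=1.0, max_iter=20, eps=1e-6):
    """Iteratively rescore candidates using collocation with the top
    candidates of the left and right neighbors (assumed update rule).
    Scores are assumed to be positive."""
    sets = [dict(c) for c in cand_sets]
    for _ in range(max_iter):
        tops = [top(c) for c in sets]
        new_sets = []
        for i, cands in enumerate(sets):
            updated = {}
            for word, score in cands.items():
                support = 0.0
                if i > 0:
                    support += collocation.get((tops[i - 1], word), 0.0)
                if i + 1 < len(sets):
                    support += collocation.get((word, tops[i + 1]), 0.0)
                updated[word] = score * (1.0 + weight * support)
            total = sum(updated.values())
            new_sets.append({w: s / total for w, s in updated.items()})
        change = max(abs(new_sets[i][w] - sets[i][w])
                     for i in range(len(sets)) for w in sets[i])
        sets = new_sets
        if change < eps:  # treated here as the "stable state"
            break
    return sets


def apply_same_word_constraint(sets, pairs, bonus=0.1):
    """Re-evaluate the top candidates of visually matching image pairs."""
    for i, j in pairs:
        ti, tj = top(sets[i]), top(sets[j])
        if ti == tj:
            # Agreement: raise the confidence of the shared decision.
            sets[i][ti] += bonus
            sets[j][tj] += bonus
        else:
            # Disagreement: the less confident image follows the other,
            # provided the word is among its candidates.
            lead, follow = (i, j) if sets[i][ti] >= sets[j][tj] else (j, i)
            word = top(sets[lead])
            if word in sets[follow]:
                sets[follow][word] = max(sets[follow].values()) + bonus
    return sets


# Hypothetical fragments of the two example sentences, relaxed separately.
collocation = {("application", "form"): 0.9}
s1 = [{"this": 1.0}, {"farm": 0.40, "form": 0.35, "foam": 0.25}]
s2 = [{"application": 1.0}, {"farm": 0.40, "form": 0.35, "foam": 0.25}]
sets = relax(s1, collocation) + relax(s2, collocation)
before = top(sets[1])
sets = apply_same_word_constraint(sets, pairs=[(1, 3)])
```

In this toy run, relaxation alone settles the second fragment on "form" but leaves the first on "farm"; the visual constraint between the two matching images then carries the confident decision over.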

Five articles from the Brown Corpus were chosen randomly
as testing samples. They are A06, G02, J4S,
N01 and R07, each with about 2,000 words. Given a
word image, our word recognizer can generate its top10
candidates from a dictionary with 70,000 different entries.
In our preliminary experiment, we exploit only
the type-1 relation listed in Figure 3. After clustering
word images by image matching, similar images are
in the same cluster. Any two images from the same cluster hold
the type-1 relation. Word collocation data were trained
from the Penn Treebank and the Brown Corpus, excluding
the five testing samples. Figure 5 shows results of candidate
selection with and without using visual inter-word constraints.
The top1 correct rate of candidate lists generated
by the word recognizer is as low as 57.10%. Without
using visual inter-word constraints, the correct rate of
candidate selection by relaxation and lattice parsing is
83.19%. After using visual inter-word constraints, the
correct rate becomes 88.22%.
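
The clustering step can be sketched as grouping word images that are connected by pairwise matches. The text does not say which clustering method was used; the union-find grouping below, which treats matching as transitive, and the name `cluster_images` are assumptions for illustration.

```python
def cluster_images(n, similar_pairs):
    """Group n word images (indexed 0..n-1) into clusters, given the
    pairs of images found similar by image matching."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical matches among six word images.
clusters = cluster_images(6, [(1, 4), (4, 5)])
```

Every pair of images inside one resulting cluster would then be treated as holding the type-1 relation.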
Figure 5: Comparison Of Candidate Selection Results



Integration of natural language processing and image
processing is a new area of interest in document analysis.
Word candidate selection is a problem we face
in degraded text recognition, as well as in handwriting
recognition. Statistical language models and
lattice parsers have been designed for the problem.
Visual inter-word constraints in a text page can be used
with linguistic knowledge sources to facilitate candidate
selection. Preliminary experimental results show that the
performance of candidate selection is improved significantly
although only one type of inter-word relation
was used. The next step is to fully integrate visual inter-word
constraints and linguistic knowledge sources in the
relaxation algorithm and the lattice parser.